Stochastic Neighbor Compression

Authors

  • Matt J. Kusner
  • Stephen Tyree
  • Kilian Q. Weinberger
  • Kunal Agrawal
Abstract

We present Stochastic Neighbor Compression (SNC), an algorithm to compress a dataset for the purpose of k-nearest neighbor (kNN) classification. Given training data, SNC learns a much smaller synthetic data set that minimizes the stochastic 1-nearest neighbor classification error on the training data. This approach has several appealing properties: due to its small size, the compressed set speeds up kNN testing drastically (up to several orders of magnitude in our experiments); it makes the kNN classifier substantially more robust to label noise; on 4 of 7 data sets it yields lower test error than kNN on the entire training set, even at compression ratios as low as 2%; finally, SNC compression leads to impressive speedups over kNN even when kNN and SNC are both used with ball-tree data structures, hashing, and LMNN dimensionality reduction. This demonstrates that SNC is complementary to existing state-of-the-art algorithms for speeding up kNN classification and leads to substantial further improvements.
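The objective named in the abstract, the stochastic 1-nearest neighbor error, can be illustrated with a minimal sketch. The abstract does not give the exact form, so the code below assumes a softmax over negative squared Euclidean distances, in the style of neighborhood components analysis. The function name `stochastic_1nn_loss`, the scale parameter `gamma`, and the toy data are all hypothetical and are not taken from the paper. The sketch only evaluates the loss for a fixed compressed set; it does not perform the learning step.

```python
import numpy as np

def stochastic_1nn_loss(X, y, Z, z_labels, gamma=1.0):
    """Sum of negative log-probabilities that each training point in X
    stochastically selects a compressed point in Z carrying its own label.
    The softmax form and gamma are assumptions made for illustration."""
    # Squared Euclidean distance from every training point to every compressed point.
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    # Softmax over compressed points; subtracting the row max keeps exp() stable.
    logits = -(gamma ** 2) * d2
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    # Probability mass that falls on compressed points sharing the training label.
    same_label = y[:, None] == z_labels[None, :]
    p_correct = (p * same_label).sum(axis=1)
    # Clip to avoid log(0) when a point has no same-label mass.
    return -np.log(np.clip(p_correct, 1e-12, None)).sum()

# Toy data: four training points in two well-separated classes,
# compressed to two synthetic points (a 50% compression ratio).
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
Z = np.array([[0.0, 0.0], [5.0, 5.0]])

good = stochastic_1nn_loss(X, y, Z, np.array([0, 1]))  # labels match the clusters
bad = stochastic_1nn_loss(X, y, Z, np.array([1, 0]))   # labels swapped
print(good < bad)
```

A compressed set whose points sit near same-label training data gives a lower loss than one with mismatched labels. Under these assumptions, learning the compressed set would amount to minimizing this quantity with respect to `Z`, for example by gradient descent.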

Similar resources

Neural Codes for Image Retrieval

This seminar report focuses on using convolutional neural networks for image retrieval. Firstly, we give a thorough discussion of several state-of-the-art techniques in image retrieval by considering the associated subproblems: image description, descriptor compression, nearest-neighbor search and query expansion. We discuss both the aggregation of local descriptors using clustering and metric ...

Accelerating Fractal Image Compression by Multi-Dimensional Nearest Neighbor Search

In fractal image compression the encoding step is computationally expensive. A large number of sequential searches through a list of domains (portions of the image) are carried out while trying to find a best match for another image portion. Our theory developed here shows that this basic procedure of fractal image compression is equivalent to multi-dimensional nearest neighbor search. This res...

Stochastic neighbor embedding (SNE) for dimension reduction and visualization using arbitrary divergences

We present a systematic approach to the mathematical treatment of the t-distributed stochastic neighbor embedding (t-SNE) and the stochastic neighbor embedding (SNE) method. This allows an easy adaptation of the methods or exchange of their respective modules. In particular, the divergence which measures the difference between probability distributions in the original and the embedding space ca...

Adaptive approximate nearest neighbor search for fractal image compression

Fractal image encoding is a computationally intensive method of compression due to its need to find the best match between image subblocks by repeatedly searching a large virtual codebook constructed from the image under compression. One of the most innovative and promising approaches to speed up the encoding is to convert the range-domain block matching problem to a nearest neighbor search pro...

Breaking the Time Complexity of Fractal Image Compression

In fractal image compression the encoding step is computationally expensive. A large number of sequential searches through a list of domains (portions of the image) are carried out while trying to find a best match for another image portion. We show that this step can be replaced by multi-dimensional nearest neighbor search which runs in logarithmic time instead of linear time required for the co...

Publication date: 2014